Refactor: convert ia-it nested loops to iat flat loops with OpenMP in ESolver_DP#7394
chengleizheng wants to merge 5 commits into
…t2ia lookup arrays and added #pragma omp parallel for guarded by #ifdef _OPENMP in runner() coord building, runner() force assignment, and type_map() atype assignment.
Nice try, you can do more, and put your test and analysis here.
#ifdef _OPENMP
#pragma omp parallel for
#endif
for (int iat = 0; iat < ucell.nat; ++iat)
I recommend default(none) because it requires explicit variable scoping and avoids hidden parallel errors.
Thanks for the recommendation!😊
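A minimal sketch of what the `default(none)` suggestion could look like. The function `scale_forces` and its variables are hypothetical and are not taken from the PR. With `default(none)`, every variable from the enclosing scope that is used inside the parallel region has to appear in a data-sharing clause, so a variable that is shared by accident becomes a compile error instead of a silent race.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the PR): scales per-atom forces by a factor.
std::vector<double> scale_forces(const std::vector<double>& force, double factor)
{
    assert(force.size() % 3 == 0); // three components per atom
    std::vector<double> scaled(force.size());

    // Plain local variables keep the data-sharing clause simple.
    int nat = static_cast<int>(force.size() / 3);
    const double* in = force.data();
    double* out = scaled.data();

#ifdef _OPENMP
// default(none): each outer variable used below must be listed explicitly.
#pragma omp parallel for default(none) shared(in, out, nat, factor)
#endif
    for (int iat = 0; iat < nat; ++iat)
    {
        // Each iteration writes only its own three entries, so there is no race.
        for (int d = 0; d < 3; ++d)
        {
            out[3 * iat + d] = factor * in[3 * iat + d];
        }
    }
    return scaled;
}
```

Leaving a name such as `factor` out of the `shared(...)` list makes an OpenMP build fail at compile time, which is the explicit scoping check the reviewer refers to.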
II. Code Change Details
Change Overview
2.1
| Metric | Before optimization | After optimization | Change |
|---|---|---|---|
| Total time | 10.23 s | 6.28 s | -38.6% |
| Run_MD md_line | 9.14 s | 5.71 s | -37.5% |
| ESolver_DP runner | 2.60 s | 2.55 s | -1.9% |
| runner per-step average | 0.260 s | 0.255 s | -1.9% |
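Analysis:
- The time spent in ESolver_DP::runner itself drops only slightly (2.60 s → 2.55 s), because the coordinate-building and force-assignment loops carry little overhead at the 864-atom scale.
- The main reason the total time drops substantially (10.23 s → 6.28 s) is that, once OpenMP is enabled, math libraries such as MKL/BLAS automatically benefit from multithreading.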
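The percentages in the table can be reproduced from the before and after timings. The sketch below assumes the change is the usual relative change, (after - before) / before; the helper names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Relative change in percent; negative values mean the run got faster.
double relative_change_percent(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

// True if the computed change agrees with a reported value rounded to 0.1%.
bool matches_reported(double before, double after, double reported_percent)
{
    return std::fabs(relative_change_percent(before, after) - reported_percent) < 0.05;
}
```

For example, `matches_reported(10.23, 6.28, -38.6)` checks the total-time row and `matches_reported(2.60, 2.55, -1.9)` checks the runner row.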
Nice, could you try different sizes of systems?
Thanks for the suggestion! I've run benchmark tests across different thread counts after adding
Test Parameters
Total: 4 test groups × 6 thread counts = 24 independent runs
Baseline (Before Optimization)
Optimized Version (After Optimization)
The total runtime is significantly reduced, and the optimization effect is most prominent in the 100-step test. Specifically, the total time drops from 48.66 s to 33.84 s with 1 thread (~30.5% reduction), and from 45.27 s to 42.21 s with 16 threads (~6.8% reduction).
Replaced ia-it nested loops with flat iat loops using ucell.iat2it/iat2ia lookup arrays and added #pragma omp parallel for guarded by #ifdef _OPENMP in runner() coord building, runner() force assignment, and type_map() atype assignment.
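The loop transformation described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's actual code: `ToyCell`, `ToyAtomType`, `make_toy_cell` and the two `build_coord_*` functions are hypothetical, and only the names `nat`, `iat2it` and `iat2ia` are taken from the description. The nested version needs a running `iat` counter, which makes the iterations depend on each other. The flat version looks up `it` and `ia` from `iat`, so each iteration is independent and can take `#pragma omp parallel for`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the unit cell data (not the real classes).
struct ToyAtomType
{
    std::vector<double> tau; // 3 coordinates per atom of this type
};

struct ToyCell
{
    int nat = 0;                    // total number of atoms
    std::vector<ToyAtomType> atoms; // one entry per atom type
    std::vector<int> iat2it;        // global atom index -> type index
    std::vector<int> iat2ia;        // global atom index -> index within type
};

// Builds a toy cell and fills the lookup arrays in (it, ia) order.
ToyCell make_toy_cell(const std::vector<int>& atoms_per_type)
{
    ToyCell cell;
    for (int it = 0; it < static_cast<int>(atoms_per_type.size()); ++it)
    {
        ToyAtomType type;
        for (int ia = 0; ia < atoms_per_type[it]; ++ia)
        {
            for (int d = 0; d < 3; ++d)
            {
                type.tau.push_back(100.0 * it + 10.0 * ia + d); // dummy value
            }
            cell.iat2it.push_back(it);
            cell.iat2ia.push_back(ia);
            ++cell.nat;
        }
        cell.atoms.push_back(type);
    }
    return cell;
}

// Original pattern: nested loops with a sequential iat counter.
std::vector<double> build_coord_nested(const ToyCell& cell)
{
    std::vector<double> coord(3 * cell.nat);
    int iat = 0;
    for (int it = 0; it < static_cast<int>(cell.atoms.size()); ++it)
    {
        const int na = static_cast<int>(cell.atoms[it].tau.size() / 3);
        for (int ia = 0; ia < na; ++ia)
        {
            for (int d = 0; d < 3; ++d)
            {
                coord[3 * iat + d] = cell.atoms[it].tau[3 * ia + d];
            }
            ++iat;
        }
    }
    return coord;
}

// Refactored pattern: one flat loop over iat with lookup arrays.
std::vector<double> build_coord_flat(const ToyCell& cell)
{
    assert(static_cast<int>(cell.iat2it.size()) == cell.nat);
    assert(static_cast<int>(cell.iat2ia.size()) == cell.nat);
    std::vector<double> coord(3 * cell.nat);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int iat = 0; iat < cell.nat; ++iat)
    {
        const int it = cell.iat2it[iat];
        const int ia = cell.iat2ia[iat];
        for (int d = 0; d < 3; ++d)
        {
            coord[3 * iat + d] = cell.atoms[it].tau[3 * ia + d];
        }
    }
    return coord;
}
```

Both functions should return the same array for any cell whose lookup arrays were filled in `(it, ia)` order, which is a convenient regression check when making this kind of change.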